12/03/2019

Motivation

Standard Approach

To make statistical problems more tractable:



\(\cdot\) Common to pool data, e.g. spatially



\(\cdot\) Partition a region, e.g. analyse region by region

Climate example

National Resource Management Regions

CSIRO and Bureau of Meteorology, 2015. Climate change in Australia information for Australia's natural resource management regions: Technical report.

Post-processing example


Whan, Kirien, and Maurice Schmeits. "Comparing area probability forecasts of (extreme) local precipitation using parametric and machine learning statistical postprocessing methods." Monthly Weather Review 146.11 (2018): 3651-3673.

Question






How should we partition regions for the analysis of extremes?

Application

Create regions that are likely to experience similar impacts

Regionalisation

These regions can then inform our statistical analysis

Outline

1. Regionalisation

  • Clustering
  • Dependence of bivariate extremes
  • Practicalities
  • Classification

2. Visualise spatial dependence

  • Max-stable processes

Regionalisation

Clustering Distance



Require: Measure of closeness between two locations


Want: Form clusters based on extremal dependence


Solution: The F-madogram distance





Bernard, Elsa, et al. "Clustering of maxima: Spatial dependencies among heavy rainfall in France." Journal of Climate 26.20 (2013): 7929-7937.

F-madogram distance

\[d(x_i, x_j) = \tfrac{1}{2} \mathbb{E} \left[ \left| F_i(M_{x_i}) - F_j(M_{x_j}) \right| \right]\] where \(M_{x_i}\) is the annual maximum rainfall at location \(x_i \in \mathbb{R}^2\) and \(F_i\) is the distribution function of \(M_{x_i}\).


Advantages:

  • Only use the raw block (annual) maxima
  • No information about climate or topography
  • Non-parametric estimation (fast)


Cooley, D., Naveau, P. and Poncet, P., 2006. Variograms for spatial max-stable random fields. In Dependence in probability and statistics (pp. 373-390). Springer, New York, NY.
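
A minimal sketch of a non-parametric estimate of the F-madogram distance, assuming two series of annual maxima that cover the same years with no missing values. The function names and the plotting position rank / (n + 1) for the empirical distribution function are assumptions made for illustration, not details taken from the cited papers.

```python
def empirical_cdf_values(maxima):
    """Rank-based estimate of F(M) for each annual maximum (ties share an average rank)."""
    n = len(maxima)
    order = sorted(range(n), key=lambda t: maxima[t])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < n and maxima[order[end + 1]] == maxima[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for t in order[pos:end + 1]:
            ranks[t] = avg_rank
        pos = end + 1
    # Assumed plotting position: rank / (n + 1) keeps values strictly inside (0, 1).
    return [r / (n + 1) for r in ranks]


def fmadogram(maxima_i, maxima_j):
    """Estimate d(x_i, x_j) = 0.5 * E|F_i(M_i) - F_j(M_j)| from paired annual maxima."""
    if len(maxima_i) != len(maxima_j):
        raise ValueError("both series must cover the same years")
    u = empirical_cdf_values(maxima_i)
    v = empirical_cdf_values(maxima_j)
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v)) / len(u)
```

Because only ranks enter the estimate, any increasing transformation of either series leaves the distance unchanged, which is why no marginal model needs to be fitted first.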

Extremal Coefficient

For \(M_{x_i}\) and \(M_{x_j}\) with common GEV marginals, \[\mathbb{P}\left( M_{x_i} \leq z, M_{x_j} \leq z \right) = \left[\mathbb{P}(M_{x_i}\leq z)\,\mathbb{P}(M_{x_j}\leq z) \right]^{\tfrac{1}{2}\theta(x_i - x_j)},\] where \(\theta(x_i - x_j)\) is the extremal coefficient and the range of \(\theta(x_i - x_j)\) is \([1 , 2]\).

The F-madogram can be expressed in terms of the extremal coefficient as \[d(x_i, x_j) = \dfrac{\theta(x_i - x_j) - 1}{2(\theta(x_i - x_j) + 1)},\] so the range of \(d(x_i, x_j)\) is \([0 , 1/6]\).
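
A short sketch of where this relation comes from and how to invert it, assuming the pair is max-stable. Write \(U = F_i(M_{x_i})\) and \(V = F_j(M_{x_j})\), which are both uniform on \([0, 1]\), and abbreviate \(\theta = \theta(x_i - x_j)\) and \(d = d(x_i, x_j)\); the symbols \(U\) and \(V\) are introduced here only for this derivation.

\[\tfrac{1}{2}|U - V| = \max(U, V) - \tfrac{1}{2}(U + V), \qquad \mathbb{P}(\max(U, V) \leq u) = u^{\theta},\]

\[d = \mathbb{E}[\max(U, V)] - \tfrac{1}{2} = \dfrac{\theta}{\theta + 1} - \dfrac{1}{2} = \dfrac{\theta - 1}{2(\theta + 1)}, \qquad \theta = \dfrac{1 + 2d}{1 - 2d}.\]

Checking the end points: \(d = 0\) gives \(\theta = 1\) (complete dependence) and \(d = 1/6\) gives \(\theta = 2\) (independence).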

Clustering



\(\checkmark\) Distance




\(?\) Algorithm

K-Medoids Clustering and PAM

  1. Randomly select an initial set of \(K\) stations. These form the initial set of medoids.
  2. Assign each station, \(x_i\), to its closest medoid, \(m_k\), based on the F-madogram distance.
  3. For each cluster, \(C_k\), update the medoid according to \[m_k = \mathop{\mathrm{argmin}}\limits_{x_i \in C_k} \sum_{x_j \in C_k} d(x_i, x_j).\]
  4. Repeat steps 2 and 3 until the medoids are no longer updated.


Kaufman, L. and Rousseeuw, P.J., 1990. Partitioning around medoids (PAM). Finding groups in data: an introduction to cluster analysis, pp.68-125.
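
The four steps above can be sketched as follows, assuming a precomputed symmetric matrix `dist` of pairwise F-madogram distances given as a list of lists. This is the alternating assign-and-update scheme described on the slide rather than the full swap search of the original PAM algorithm, and the function names, `seed` and `max_iter` are illustrative assumptions.

```python
import random


def assign(dist, medoids):
    """Step 2: label each station with the index of its closest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, seed=0, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(dist)), k)  # Step 1: random initial medoids
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty (possible with tied distances).
                new_medoids.append(medoids[c])
                continue
            # Step 3: the new medoid minimises the summed distance within the cluster.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:  # Step 4: stop once the medoids are stable
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)
```

The result depends on the random initial medoids, so in practice it is common to rerun with several seeds and keep the partition with the smallest total within-cluster distance.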

Result

Example

Consider \(\max \{ \| x_i - x_j \|, 2\}\) as the clustering distance.

Density example

Gridded data

  • Ensure there are sufficient medoids


  • Spatial density is changed by land-sea and domain boundaries


  • Tendency toward clusters of equal size


  • Clustering is in F-madogram space not Euclidean

Hierarchical Clustering

Linkage Rule: For each pair of clusters, \(C_k\) and \(C_{k'}\), \[d(C_k, C_{k'}) = \frac{1}{|C_k| |C_{k'}|} \sum_{x_k \in C_k} \sum_{x_{k'} \in C_{k'}} d(x_k, x_{k'}).\]
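
A minimal sketch of agglomerative clustering with this average linkage rule, assuming a precomputed symmetric distance matrix `dist` as a list of lists. The function name and the `cut_height` stopping rule are illustrative assumptions; the brute-force search over pairs is written for readability, not speed.

```python
def average_linkage_clusters(dist, cut_height):
    """Merge the closest pair of clusters until the smallest linkage exceeds cut_height."""
    clusters = [[i] for i in range(len(dist))]  # start with one cluster per station

    def linkage(a, b):
        # Average of all pairwise distances between members of a and members of b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(linkage(clusters[p], clusters[q]), p, q)
                 for p in range(len(clusters))
                 for q in range(p + 1, len(clusters))]
        height, p, q = min(pairs)
        if height > cut_height:
            break  # every remaining pair is further apart than the cut height
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)] + [merged]
    return clusters
```

Unlike K-medoids, the number of clusters is not fixed in advance: it follows from the cut height, so lowering the cut height yields more, smaller regions.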

Back to the first example

Classify

  • Classify a station relative to its closest neighbours
  • Use a weighted \(k\)-nearest-neighbour classification (\(w\)-kNN)
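
A minimal sketch of a weighted kNN classification, assuming stations are given as coordinate pairs with known cluster labels. The inverse-distance weights, the Euclidean distance between coordinates and the function name are assumptions for illustration; the slides do not specify the weighting used.

```python
import math
from collections import defaultdict


def weighted_knn(query, stations, labels, k=3):
    """Classify `query` by a distance-weighted vote among its k nearest labelled stations."""
    neighbours = sorted(
        zip(stations, labels),
        key=lambda pair: math.dist(query, pair[0]),
    )[:k]
    votes = defaultdict(float)
    for station, label in neighbours:
        # Assumed weighting: closer stations count more; the small constant avoids division by zero.
        votes[label] += 1.0 / (math.dist(query, station) + 1e-9)
    return max(votes, key=votes.get)
```

For example, `weighted_knn((0.2, 0.2), stations, labels)` returns the label whose nearby stations carry the largest total weight, so a single very close neighbour can outvote two distant ones.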

Results

Choosing a cut height

Similar Dependence




Where can we assume a common dependence structure for extremes?

Visualising Dependence

Max-stable Processes

  • Extremes in continuous space with dependence

  • Natural extension from univariate extreme value theory

  • Univariate marginal distributions are GEV distributions

  • Can simulate from these processes

Smith Process

Visualising Dependence

  • Visualise the partition using elliptical level curves

\[ \mathbb{P}(\| \mathbf{x} - \mathbf{c_k} \| < r) = 1 - \exp \left( \frac{-r^2}{2} \right)\]

  • Centre the curve on the centroid of cluster \(k\), \(\mathbf{c_k}\)
  • Repeat fitting by sampling stations to understand uncertainty
  • Size and direction of ellipses have a natural interpretation in terms of dependence
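
A minimal sketch of how such a level curve could be traced, assuming the probability statement above is read in coordinates standardised by a fitted 2x2 covariance matrix. The matrix `sigma`, the function names and the default probability are hypothetical and introduced only for illustration. Solving \(p = 1 - \exp(-r^2/2)\) gives \(r = \sqrt{-2 \ln(1 - p)}\), and the circle of radius \(r\) is mapped to an ellipse through the Cholesky factor of `sigma`.

```python
import math


def level_radius(p):
    """Radius r solving p = 1 - exp(-r^2 / 2), for 0 <= p < 1."""
    return math.sqrt(-2.0 * math.log(1.0 - p))


def ellipse_points(centre, sigma, p=0.5, n=100):
    """Points on the elliptical level curve with probability p around `centre`.

    `sigma` is an assumed 2x2 symmetric positive-definite covariance matrix,
    given as ((s11, s12), (s12, s22)).
    """
    (s11, s12), (_, s22) = sigma
    # Cholesky factor L of sigma (lower triangular), written out for the 2x2 case.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    r = level_radius(p)
    points = []
    for t in range(n):
        angle = 2.0 * math.pi * t / n
        u, v = r * math.cos(angle), r * math.sin(angle)
        # Map the circle of radius r through L and shift to the cluster centroid.
        points.append((centre[0] + l11 * u, centre[1] + l21 * u + l22 * v))
    return points
```

With an identity `sigma` the curve is a circle; a larger variance in one direction stretches the ellipse along that direction, which is what gives its size and orientation an interpretation in terms of dependence.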

Southwest Western Australia

Tasmania

Relevance to post-processing

Oesting, M., Schlather, M. and Friederichs, P., 2017. Statistical post-processing of forecasts for extremes using bivariate Brown-Resnick processes with an application to wind gusts. Extremes, 20(2), pp.309-332.

Perfect Prog Approach:
Simulate an ensemble from the fitted max-stable process

Assumptions:
The fitted statistical model is the truth

Relevance:
Need to ensure that the way we model the dependence is accurate

Conclusions

  • Created a regionalisation of Australia based on extremal dependence

  • Highlighted some considerations for clustering applications

  • Used the regionalisation to fit max-stable models

  • Visualised extremal dependence

  • Helped us understand where we can reasonably assume a single dependence structure

Future work

  • Non-stationary dependence!

  • Post-processing of compound events: Storm surge and Precipitation

e. K.R.Saunders@tudelft.nl

t. @katerobsau

g. github.com/katerobsau